Using Other Learner Corpora in the 2013 NLI Shared Task
Abstract
Our efforts in the 2013 NLI shared task focused on the potential benefits of external corpora. We show that including training data from multiple corpora is highly effective for robust, cross-corpus NLI (i.e., open-training task 1), particularly when some form of domain adaptation is also applied. This method can also boost performance even when training data from the same corpus is available (i.e., open-training task 2). However, in the closed-training task, despite testing a number of new features, we did not see much improvement over a simple model based on earlier work.
Similar resources
Learning with Learner Corpora: using the TLE for Native Language Identification
This study investigates the usefulness of the Treebank of Learner English (TLE) when applied to the task of Native Language Identification (NLI). The TLE is effectively a parallel corpus of Standard/Learner English, as there are two versions: one based on original learner essays, and the other an error-corrected version. We use the corpus to explore how useful a parser trained on ungrammatical ...
Recognizing English Learners' Native Language from Their Writings
Native Language Identification (NLI), which tries to identify the native language (L1) of a second language learner based on their writings, is helpful for advancing second language learning and authorship profiling in forensic linguistics. With the availability of relevant data resources, much work has been done to explore the native language of a foreign language learner. In this report, we p...
NAIST at the NLI 2013 Shared Task
This paper describes the Nara Institute of Science and Technology (NAIST) native language identification (NLI) system in the NLI 2013 Shared Task. We apply feature selection using a measure based on frequency for the closed track and try Capping and Sampling data methods for the open tracks. Our system ranked ninth in the closed track, third in open track 1 and fourth in open track 2.
Constrained Grammatical Error Correction using Statistical Machine Translation
This paper describes our use of phrase-based statistical machine translation (PBSMT) for the automatic correction of errors in learner text in our submission to the CoNLL 2013 Shared Task on Grammatical Error Correction. Since the limited training data provided for the task was insufficient for training an effective SMT system, we also explored alternative ways of generating pairs of incorrect a...
Feature Extraction for Native Language Identification Using Language Modeling
This paper reports on the task of Native Language Identification (NLI). We developed a machine learning system to identify the native language of authors of English texts written by non-native English speakers. Our system is based on the language modeling approach and employs cross-entropy scores as features for supervised learning, which leads to a significantly reduced feature space. Our metho...